Fault Tolerant Scheduling for Parallel Loops on Shared Memory Systems
Authors
Abstract
Multicore and multiprocessor systems achieve significant speedups for many applications by exploiting loop-level parallelism, but they also face growing reliability problems as device sizes continue to shrink. This paper addresses the reliability of loop-dominated applications, aiming to execute parallel loops efficiently in the presence of various types of hardware faults. We present a fault-tolerant work-stealing scheme that makes parallel loop execution resilient to such faults. The scheme applies a lightweight buffer-commit mechanism to ensure that re-executed loop iterations produce correct results. In addition, large failing chunks of loop iterations are split at runtime to improve load balancing, and a worker thread is discarded when faults occur on it frequently. We evaluated these techniques on a multi-socket multicore system using a set of loop-dominated benchmarks. In this evaluation, the proposed scheme supports fault tolerance with minimal overhead and achieves optimal load balancing.
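The three mechanisms named in the abstract (buffer-commit, splitting of failing chunks, and discarding a frequently failing worker) can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the paper's implementation: the function `run_loop`, the fault injector `fault_at`, the threshold `max_faults`, and the halving split policy are all hypothetical choices made for illustration, and faults are assumed to be detected as soon as they occur.

```python
from collections import deque

def run_loop(n_iters, body, n_workers=4, chunk=8, fault_at=None, max_faults=2):
    """Simulate a fault-tolerant work-stealing loop over range(n_iters).

    fault_at is a hypothetical fault injector: a set of (worker, iteration)
    pairs, each of which triggers one detected fault.
    """
    pending_faults = set(fault_at or ())
    result = [None] * n_iters                      # committed, shared output
    queues = [deque() for _ in range(n_workers)]   # one work queue per worker
    # Deal fixed-size chunks of iterations to the workers round-robin.
    for k, start in enumerate(range(0, n_iters, chunk)):
        queues[k % n_workers].append((start, min(start + chunk, n_iters)))
    faults = [0] * n_workers
    alive = [True] * n_workers

    def take(w):
        if queues[w]:
            return queues[w].popleft()             # own work first
        for v in range(n_workers):                 # otherwise steal from another
            if v != w and queues[v]:               # queue, including those of
                return queues[v].pop()             # discarded workers
        return None

    while any(queues):
        progressed = False
        for w in range(n_workers):
            if not alive[w]:
                continue
            job = take(w)
            if job is None:
                continue
            progressed = True
            lo, hi = job
            buffer = {}                            # private buffer for this chunk
            failed = False
            for i in range(lo, hi):
                if (w, i) in pending_faults:       # injected fault is detected
                    pending_faults.discard((w, i))
                    failed = True
                    break
                buffer[i] = body(i)
            if not failed:
                for i, v in buffer.items():        # commit only after the whole
                    result[i] = v                  # chunk finished without a fault
                continue
            # On a fault the buffer is dropped, so nothing partial is visible.
            faults[w] += 1
            mid = (lo + hi) // 2
            pieces = [(lo, mid), (mid, hi)] if hi - lo > 1 else [(lo, hi)]
            queues[w].extend(pieces)               # split the failing chunk
            if faults[w] >= max_faults:
                alive[w] = False                   # discard the unreliable worker
        if not progressed:
            raise RuntimeError("all workers were discarded")
    return result, alive, faults

squares, alive, faults = run_loop(
    20, lambda i: i * i, n_workers=3, fault_at={(0, 1), (0, 3)}
)
```

In this run worker 0 faults twice and is discarded, its split chunks are stolen by the remaining workers, and the committed output is the same as that of a fault-free execution because partial results never leave the private buffer.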
Similar resources
Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach
One of the most sought-after software innovations of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Many researchers are using conventional techniques such as RPC, DSM, replication, causal communications and other techniques to provide parallel computing facilities on workstation ne...
Adaptively Scheduling Parallel Loops in Distributed Shared-Memory Systems
Using runtime information of load distributions and processor affinity, we propose an adaptive scheduling algorithm and its variations from different control mechanisms. The proposed algorithm applies different degrees of aggressiveness to adjust loop scheduling granularities, aiming at improving the execution performance of parallel loops by making scheduling decisions that match the real work...
Executing Nested Parallel Loops on Shared-Memory Multiprocessors
Cache-coherent, bus-based shared-memory multiprocessors are a cost-effective platform for parallel processing. In scientific parallel applications, most of the computation involves processing of large multidimensional data structures which results in a high degree of data parallelism. This parallelism can be exploited in the form of nested parallel loops. Most existing shared memory multiprocesso...
TS-PVM: a fault tolerant PVM extension for real time applications
This work introduces TS-PVM, a fault-tolerant extension of the de facto message-passing system Parallel Virtual Machine (PVM). The extension enables real-time applications over PVM. In PVM and similar message-passing systems, if the message receiver is not available, due to network failure or machine crash, a failure is reported and data must...
Executing Nested Parallel Loops on Shared-Memory Multiprocessors
Cache-coherent, bus-based shared-memory multiprocessors are a cost-effective platform for parallel processing. In scientific parallel applications, most of the computation involves processing of large multidimensional data structures which results in a high degree of data parallelism. This parallelism can be exploited in the form of nested parallel loops. Most existing shared memory multiprocesso...
Journal title:
- J. Inf. Sci. Eng.
Volume: 31
Publication date: 2015